DESCRIBE
Overview
The DESCRIBE function computes several descriptive statistics for a dataset in a single operation, providing a comprehensive summary of the data’s distribution. It returns seven key metrics: the number of observations, minimum value, maximum value, arithmetic mean, variance, skewness, and kurtosis.
This implementation uses the scipy.stats.describe function from the SciPy library, a fundamental tool for scientific computing in Python. The underlying source code is available in the SciPy GitHub repository.
Skewness measures the asymmetry of a distribution. A value of zero indicates a symmetric distribution, positive values indicate a longer right tail, and negative values indicate a longer left tail. Kurtosis (Fisher’s definition) measures the “tailedness” of a distribution relative to a normal distribution. The function normalizes kurtosis so that a normal distribution has a kurtosis of zero; positive values indicate heavier tails (leptokurtic), while negative values indicate lighter tails (platykurtic).
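As a rough illustration of these two shape statistics, the sketch below computes the uncorrected skewness and Fisher kurtosis from central moments and compares them with scipy.stats.skew and scipy.stats.kurtosis. The helper name central_moment and the sample values are illustrative assumptions, not part of DESCRIBE.

```python
from scipy.stats import kurtosis, skew

def central_moment(values, k):
    """Return the k-th central moment (divisor n) of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** k for v in values) / len(values)

sample = [1, 2, 3, 4, 5, 6]  # illustrative data, symmetric about 3.5

m2 = central_moment(sample, 2)
m3 = central_moment(sample, 3)
m4 = central_moment(sample, 4)

manual_skew = m3 / m2 ** 1.5      # zero for a symmetric sample
manual_kurt = m4 / m2 ** 2 - 3.0  # Fisher definition: a normal distribution gives 0

# SciPy's defaults (bias=True, fisher=True) use the same uncorrected formulas.
scipy_skew = float(skew(sample))
scipy_kurt = float(kurtosis(sample))

print(manual_skew, scipy_skew)
print(manual_kurt, scipy_kurt)
```

The kurtosis comes out negative here because an evenly spaced sample has lighter tails than a normal distribution (platykurtic).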
The ddof parameter (delta degrees of freedom) adjusts the divisor used in the variance calculation. With ddof=0, the function computes the population variance (dividing by n); with ddof=1, it computes the sample variance (dividing by n-1), which gives an unbiased estimate when the data are a sample. This wrapper defaults to ddof=0, whereas scipy.stats.describe itself defaults to ddof=1. The bias parameter controls whether the skewness and kurtosis calculations are corrected for statistical bias: bias=False applies the correction, and bias=True returns the uncorrected values.
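The effect of both parameters can be checked by calling scipy.stats.describe directly. The following is a minimal sketch; the sample data and the variable names population and corrected are illustrative assumptions.

```python
from scipy.stats import describe

sample = [1, 2, 3, 4, 5, 6]  # illustrative data

# ddof=0 divides the variance by n; bias=True leaves skew/kurtosis uncorrected.
population = describe(sample, ddof=0, bias=True)

# ddof=1 divides the variance by n - 1; bias=False applies the bias correction.
corrected = describe(sample, ddof=1, bias=False)

print(population.variance, corrected.variance)
print(population.kurtosis, corrected.kurtosis)
```

Note that ddof only changes the variance, while bias only changes skewness and kurtosis, so the two settings can be combined independently.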
The function flattens the input data and filters out non-numeric or non-finite values before computing statistics, requiring at least two valid numeric values to produce results.
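The filtering step can be sketched in isolation as follows. The helper name flatten_finite and the mixed input table are hypothetical and serve only to show which cells survive.

```python
import math

def flatten_finite(table):
    """Flatten a 2D list, keeping only values that convert to a finite float."""
    kept = []
    for row in table:
        for cell in row:
            try:
                value = float(cell)
            except (TypeError, ValueError):
                continue  # e.g. None or non-numeric text
            if math.isfinite(value):  # drops NaN and +/- infinity
                kept.append(value)
    return kept

mixed = [[1, "n/a", None], [float("nan"), 2.5, float("inf")]]
cleaned = flatten_finite(mixed)
print(cleaned)
```

Because the conversion relies on float(), numeric strings such as "3" are also kept under this approach.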
This example function is provided as-is without any representation of accuracy.
Excel Usage
=DESCRIBE(data, ddof, bias)
data (list[list], required): Table of numeric values to analyze.
ddof (int, optional, default: 0): Delta degrees of freedom for variance calculation.
bias (bool, optional, default: false): If false, skewness and kurtosis are corrected for statistical bias; if true, the uncorrected values are returned.
Returns (list[list]): 2D list [[nobs, min, max, mean, var, skew, kurt]], or error string.
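Because the return value is either a single-row 2D list or an error string, callers may want to check the type before reading the statistics. The sketch below shows one way to do that; the helper result_to_dict and the FIELDS tuple are hypothetical and not part of the function itself.

```python
FIELDS = ("nobs", "min", "max", "mean", "var", "skew", "kurt")

def result_to_dict(result):
    """Map a DESCRIBE-style return value to a dict; raise if it is an error string."""
    if isinstance(result, str):
        raise ValueError(result)
    (row,) = result  # the function returns exactly one row
    if len(row) != len(FIELDS):
        raise ValueError("unexpected number of statistics")
    return dict(zip(FIELDS, row))

# Values taken from Example 1 in this document.
example = [[6, 1.0, 6.0, 3.5, 2.9167, 0.0, -1.2]]
summary = result_to_dict(example)
print(summary["mean"], summary["var"])
```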
Examples
Example 1: Basic statistics with default parameters
Inputs:
| data | | | ddof | bias |
|---|---|---|---|---|
| 1 | 2 | 3 | 0 | false |
| 4 | 5 | 6 | | |
Excel formula:
=DESCRIBE({1,2,3;4,5,6}, 0, FALSE)
Expected output:
| nobs | min | max | mean | var | skew | kurt |
|---|---|---|---|---|---|---|
| 6 | 1 | 6 | 3.5 | 2.9167 | 0 | -1.2 |
Example 2: Statistics with ddof=1 for sample variance
Inputs:
| data | | | ddof | bias |
|---|---|---|---|---|
| 1 | 2 | 3 | 1 | false |
| 4 | 5 | 6 | | |
Excel formula:
=DESCRIBE({1,2,3;4,5,6}, 1, FALSE)
Expected output:
| nobs | min | max | mean | var | skew | kurt |
|---|---|---|---|---|---|---|
| 6 | 1 | 6 | 3.5 | 3.5 | 0 | -1.2 |
Example 3: Statistics with bias=TRUE (bias correction disabled)
Inputs:
| data | | | ddof | bias |
|---|---|---|---|---|
| 1 | 2 | 3 | 0 | true |
| 4 | 5 | 6 | | |
Excel formula:
=DESCRIBE({1,2,3;4,5,6}, 0, TRUE)
Expected output:
| nobs | min | max | mean | var | skew | kurt |
|---|---|---|---|---|---|---|
| 6 | 1 | 6 | 3.5 | 2.9167 | 0 | -1.2686 |
Example 4: Statistics for larger values dataset
Inputs:
| data | | | ddof | bias |
|---|---|---|---|---|
| 10 | 20 | 30 | 0 | false |
| 40 | 50 | 60 | | |
Excel formula:
=DESCRIBE({10,20,30;40,50,60}, 0, FALSE)
Expected output:
| nobs | min | max | mean | var | skew | kurt |
|---|---|---|---|---|---|---|
| 6 | 10 | 60 | 35 | 291.6667 | 0 | -1.2 |
Python Code
import math
from scipy.stats import describe as scipy_describe

def describe(data, ddof=0, bias=False):
    """
    Compute descriptive statistics using scipy.stats.describe.

    See: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.describe.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        data (list[list]): Table of numeric values to analyze.
        ddof (int, optional): Delta degrees of freedom for variance calculation. Default is 0.
        bias (bool, optional): If False, skewness and kurtosis are corrected for statistical bias. Default is False.

    Returns:
        list[list]: 2D list [[nobs, min, max, mean, var, skew, kurt]], or error string.
    """
    def to2d(x):
        # Wrap a scalar so that a single cell is treated as a 1x1 table.
        return [[x]] if not isinstance(x, list) else x

    data = to2d(data)
    if not isinstance(data, list) or not all(isinstance(row, list) for row in data):
        return "Invalid input: data must be a 2D list."

    # Flatten the table, keeping only cells that convert to a finite float.
    flat = []
    for row in data:
        for x in row:
            try:
                val = float(x)
                if math.isfinite(val):
                    flat.append(val)
            except (TypeError, ValueError):
                continue

    if len(flat) < 2:
        return "Invalid input: data must contain at least two numeric values."

    # Check finiteness first: int() raises on NaN or infinity.
    if (not isinstance(ddof, (int, float)) or not math.isfinite(ddof)
            or int(ddof) != ddof or ddof < 0):
        return "Invalid input: ddof must be a non-negative integer."
    if not isinstance(bias, bool):
        return "Invalid input: bias must be a boolean."

    try:
        res = scipy_describe(flat, ddof=int(ddof), bias=bias)
    except Exception as e:
        return f"scipy.stats.describe error: {e}"

    # Split SciPy's minmax tuple so the output is one flat row of seven values.
    out = [
        int(res.nobs),
        float(res.minmax[0]),
        float(res.minmax[1]),
        float(res.mean),
        float(res.variance),
        float(res.skewness),
        float(res.kurtosis),
    ]
    return [out]